What if Readers Like A.I.-Generated Fiction?
Finally, he gave the summaries to his fine-tuned model, and he asked it to compose passages "in the style of Vauhini Vara." Going into all this, I was self-assured, even smug. I'd always felt that my style was original and, more important, that my books were totally distinct from one another. I figured that, even if the A.I. model could imitate my past books, it couldn't predict the style of the novel in progress. So, when Chakrabarty sent me the A.I.-generated imitations, I was genuinely confused.
- South America (0.04)
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- North America > United States > Michigan (0.04)
- (7 more...)
- Personal (1.00)
- Research Report > New Finding (0.46)
- Media > News (0.46)
- Education > Educational Setting > K-12 Education (0.46)
Advancing Autonomous Driving: DepthSense with Radar and Spatial Attention
Hussain, Muhammad Ishfaq, Naz, Zubia, Rafique, Muhammad Aasim, Jeon, Moongu
Depth perception is crucial for spatial understanding and has traditionally been achieved through stereoscopic imaging. However, the precision of depth estimation using stereoscopic methods depends on the accurate calibration of binocular vision sensors. Monocular cameras, while more accessible, often suffer from reduced accuracy, especially under challenging imaging conditions. Optical sensors, too, face limitations in adverse environments, leading researchers to explore radar technology as a reliable alternative. Although radar provides coarse but accurate signals, its integration with fine-grained monocular camera data remains underexplored. In this research, we propose DepthSense, a novel radar-assisted monocular depth enhancement approach. DepthSense employs an encoder-decoder architecture, a Radar Residual Network, feature fusion with a spatial attention mechanism, and an ordinal regression layer to deliver precise depth estimations. We conducted extensive experiments on the nuScenes dataset to validate the effectiveness of DepthSense. Our methodology not only surpasses existing approaches in quantitative performance but also reduces parameter complexity and inference times. Our findings demonstrate that DepthSense represents a significant advancement over traditional stereo methods, offering a robust and efficient solution for depth estimation in autonomous driving. By leveraging the complementary strengths of radar and monocular camera data, DepthSense sets a new benchmark in the field, paving the way for more reliable and accurate spatial perception systems.
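The fusion step described above, attention-weighted blending of camera and radar feature maps, can be sketched in a few lines. The pooling-based mask below is a generic spatial-attention stand-in, not a reproduction of DepthSense's actual layers or learned weights:

```python
import numpy as np

def spatial_attention_fuse(cam_feat, radar_feat):
    """Fuse two (H, W, C) feature maps with a spatial attention mask.

    Generic sketch: the mask comes from channel-wise average and max
    pooling over the concatenated features; DepthSense's real module
    uses learned convolutions instead of the fixed weights here.
    """
    fused = np.concatenate([cam_feat, radar_feat], axis=-1)  # (H, W, 2C)
    avg_pool = fused.mean(axis=-1, keepdims=True)            # (H, W, 1)
    max_pool = fused.max(axis=-1, keepdims=True)             # (H, W, 1)
    # stand-in for a 1x1 conv over the two pooled maps: equal weights
    logits = 0.5 * avg_pool + 0.5 * max_pool
    mask = 1.0 / (1.0 + np.exp(-logits))                     # sigmoid
    # convex combination: where the mask is high, trust the camera
    return mask * cam_feat + (1.0 - mask) * radar_feat       # (H, W, C)

rng = np.random.default_rng(0)
cam = rng.random((4, 4, 8))
radar = rng.random((4, 4, 8))
out = spatial_attention_fuse(cam, radar)
print(out.shape)  # (4, 4, 8)
```

Because the mask lies in (0, 1), each output element stays between the corresponding camera and radar values, so the fusion can never amplify either modality beyond its input range.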
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
- Asia > South Korea > Gwangju > Gwangju (0.04)
- Asia > Pakistan > Punjab > Lahore Division > Lahore (0.04)
- (8 more...)
- Transportation > Ground > Road (0.61)
- Information Technology > Robotics & Automation (0.61)
A Humanoid Visual-Tactile-Action Dataset for Contact-Rich Manipulation
Kwon, Eunju, Oh, Seungwon, Baek, In-Chang, Park, Yucheon, Kim, Gyungbo, Moon, JaeYoung, Choi, Yunho, Kim, Kyung-Joong
Contact-rich manipulation has become increasingly important in robot learning. However, previous studies on robot learning datasets have focused on rigid objects and underrepresented the diversity of pressure conditions for real-world manipulation. To address this gap, we present a humanoid visual-tactile-action dataset designed for manipulating deformable soft objects. The dataset was collected via teleoperation using a humanoid robot equipped with dexterous hands, capturing multi-modal interactions under varying pressure conditions.

Contact-rich interaction represents a critical gateway for enabling robots to perform complex tasks in real-world environments, yet it remains one of the fundamental challenges in robotic manipulation [1].
Regional Attention-Enhanced Swin Transformer for Clinically Relevant Medical Image Captioning
Naz, Zubia, Asghar, Farhan, Hussain, Muhammad Ishfaq, Hadadi, Yahya, Rafique, Muhammad Aasim, Choi, Wookjin, Jeon, Moongu
Automated medical image captioning translates complex radiological images into diagnostic narratives that can support reporting workflows. We present a Swin-BART encoder-decoder system with a lightweight regional attention module that amplifies diagnostically salient regions before cross-attention. Trained and evaluated on ROCO, our model achieves state-of-the-art semantic fidelity while remaining compact and interpretable. We report results as mean$\pm$std over three seeds and include $95\%$ confidence intervals. Compared with baselines, our approach improves ROUGE (proposed 0.603, ResNet-CNN 0.356, BLIP2-OPT 0.255) and BERTScore (proposed 0.807, BLIP2-OPT 0.645, ResNet-CNN 0.623), with competitive BLEU, CIDEr, and METEOR. We further provide ablations (regional attention on/off and token-count sweep), per-modality analysis (CT/MRI/X-ray), paired significance tests, and qualitative heatmaps that visualize the regions driving each description. Decoding uses beam search (beam size $=4$), length penalty $=1.1$, $no\_repeat\_ngram\_size$ $=3$, and max length $=128$. The proposed design yields accurate, clinically phrased captions and transparent regional attributions, supporting safe research use with a human in the loop.
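One decoding setting the abstract reports, $no\_repeat\_ngram\_size$ $=3$, can be illustrated with a small standalone sketch (generic, not the authors' implementation): at each step the decoder bans any token whose emission would recreate a trigram already present in the generated sequence.

```python
def banned_next_tokens(tokens, n=3):
    """Return the tokens that would complete an n-gram already present
    in `tokens`. This mirrors the no_repeat_ngram_size=3 decoding
    constraint reported in the paper; a generic sketch of the rule,
    not the authors' code."""
    if len(tokens) < n - 1:
        return set()
    prefix = tuple(tokens[-(n - 1):])  # the last n-1 generated tokens
    banned = set()
    # any earlier occurrence of the same (n-1)-token prefix bans
    # the token that followed it
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned

# With trigram blocking, after "a b c a b" the token "c" is banned,
# since emitting it would repeat the trigram (a, b, c).
print(banned_next_tokens(["a", "b", "c", "a", "b"], n=3))  # {'c'}
```

In beam search, this check runs per beam before scoring, which is why larger beams (the paper uses beam size 4) still cannot loop on short phrases.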
- Asia > South Korea > Gwangju > Gwangju (0.05)
- Asia > Middle East > Saudi Arabia > Eastern Province > Al-Ahsa Governorate > Al-Hofuf (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- Europe > Spain > Andalusia > Granada Province > Granada (0.04)
Deformable Dynamic Convolution for Accurate yet Efficient Spatio-Temporal Traffic Prediction
Jin, Hyeonseok, Kim, Geonmin, Kim, Kyungbaek
Traffic prediction is a critical component of intelligent transportation systems, enabling applications such as congestion mitigation and accident risk prediction. While recent research has explored both graph-based and grid-based approaches, key limitations remain. Graph-based methods effectively capture non-Euclidean spatial structures but often incur high computational overhead, limiting their practicality in large-scale systems. In contrast, grid-based methods, which primarily leverage Convolutional Neural Networks (CNNs), offer greater computational efficiency but struggle to model irregular spatial patterns due to the fixed shape of their filters. Moreover, both approaches often fail to account for inherent spatio-temporal heterogeneity, as they typically apply a shared set of parameters across diverse regions and time periods. To address these challenges, we propose the Deformable Dynamic Convolutional Network (DDCN), a novel CNN-based architecture that integrates both deformable and dynamic convolution operations. The deformable layer introduces learnable offsets to create flexible receptive fields that better align with spatial irregularities, while the dynamic layer generates region-specific filters, allowing the model to adapt to varying spatio-temporal traffic patterns. By combining these two components, DDCN effectively captures both non-Euclidean spatial structures and spatio-temporal heterogeneity. Extensive experiments on four real-world traffic datasets demonstrate that DDCN achieves competitive predictive performance while significantly reducing computational costs, underscoring its potential for large-scale and real-time deployment.
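The dynamic-convolution idea above, generating region-specific filters from the input rather than sharing one fixed kernel, can be illustrated with a toy 1-D sketch. The softmax "filter generator" below is a hypothetical stand-in for DDCN's learned generator network:

```python
import math

def dynamic_conv1d(signal, k=3):
    """Dynamic convolution sketch: the filter at each position is
    generated from the local input window (here via a softmax over the
    window values) instead of being one shared, fixed kernel.
    Illustrative only; DDCN's filter generator is a learned network."""
    half = k // 2
    out = []
    for i in range(half, len(signal) - half):
        window = signal[i - half:i + half + 1]
        # filter-generator stand-in: softmax of the window itself,
        # so the effective filter adapts to each region of the input
        exps = [math.exp(v) for v in window]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append(sum(w * v for w, v in zip(weights, window)))
    return out

y = dynamic_conv1d([0.0, 1.0, 0.0, 5.0, 0.0], k=3)
print(len(y))  # 3 valid positions for k=3 without padding
```

The deformable half of DDCN additionally learns *where* each filter tap samples (via per-position offsets), which this fixed-grid sketch does not attempt to show.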
- Asia > South Korea > Gwangju > Gwangju (0.04)
- Asia > China > Beijing > Beijing (0.04)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Transportation > Infrastructure & Services (0.88)
- Transportation > Ground > Road (0.68)
A Dataset and Benchmark for Robotic Cloth Unfolding Grasp Selection: The ICRA 2024 Cloth Competition
De Gusseme, Victor-Louis, Lips, Thomas, Proesmans, Remko, Hietala, Julius, Lee, Giwan, Choi, Jiyoung, Choi, Jeongil, Kim, Geon, Yonrith, Phayuth, Tabernik, Domen, Gams, Andrej, Nimac, Peter, Urbas, Matej, Muhovič, Jon, Skočaj, Danijel, Mavsar, Matija, Yu, Hyojeong, Kwon, Minseo, Kim, Young J., Cong, Yang, Chen, Ronghan, Ren, Yu, Diao, Supeng, Weng, Jiawei, Liu, Jiayue, Sun, Haoran, Yang, Linhan, Zhang, Zeqing, Guo, Ning, Yang, Lei, Wan, Fang, Song, Chaoyang, Pan, Jia, Jin, Yixiang, A, Yong, Shi, Jun, Li, Dingzhe, Yang, Yong, Yamasaki, Kakeru, Kajiwara, Takumi, Nakadera, Yuki, Saxena, Krati, Shibata, Tomohiro, Xia, Chongkun, Mo, Kai, Yu, Yanzhao, Lin, Qihao, Ma, Binqiang, Sagong, Uihun, Choi, JungHyun, Park, JeongHyun, Lee, Dongwoo, Kim, Yeongmin, Hwang, Myun Joong, Kuribayashi, Yusuke, Hiratsuka, Naoki, Tanaka, Daisuke, Arnold, Solvi, Yamazaki, Kimitoshi, Mateo-Agullo, Carlos, Verleysen, Andreas, Wyffels, Francis
Robotic cloth manipulation suffers from a lack of standardized benchmarks and shared datasets for evaluating and comparing different approaches. To address this, we created a benchmark and organized the ICRA 2024 Cloth Competition, a unique head-to-head evaluation focused on grasp pose selection for in-air robotic cloth unfolding. Eleven diverse teams participated in the competition, utilizing our publicly released dataset of real-world robotic cloth unfolding attempts and a variety of methods to design their unfolding approaches. Afterwards, we also expanded our dataset with 176 competition evaluation trials, resulting in a dataset of 679 unfolding demonstrations across 34 garments. Analysis of the competition results revealed insights into the trade-off between grasp success and coverage, the surprisingly strong achievements of hand-engineered methods, and a significant discrepancy between competition performance and prior work, underscoring the importance of independent, out-of-the-lab evaluation in robotic cloth manipulation. The associated dataset is a valuable resource for developing and evaluating grasp selection methods, particularly for learning-based approaches. We hope that our benchmark, dataset and competition results can serve as a foundation for future benchmarks and drive further progress in data-driven robotic cloth manipulation. The dataset and benchmarking code are available at https://airo.ugent.be/cloth_competition.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Asia > China > Hong Kong (0.05)
- Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
- (13 more...)
- Research Report (1.00)
- Overview (0.93)
Recovering Plasticity of Neural Networks via Soft Weight Rescaling
Oh, Seungwon, Park, Sangyeon, Han, Isaac, Kim, Kyung-Joong
Recent studies have shown that as training progresses, neural networks gradually lose their capacity to learn new information, a phenomenon known as plasticity loss. Unbounded weight growth is one of the main causes of plasticity loss; it also harms generalization capability and disrupts optimization dynamics. Re-initializing the network can be a solution, but it results in the loss of learned information, leading to performance drops. In this paper, we propose Soft Weight Rescaling (SWR), a novel approach that prevents unbounded weight growth without losing information. SWR recovers the plasticity of the network by simply scaling down the weights at each step of the learning process. We theoretically prove that SWR bounds weight magnitude and balances weight magnitudes across layers. Our experiments show that SWR improves performance on warm-start learning, continual learning, and single-task learning setups on standard image classification benchmarks.

Recent works have revealed that a neural network loses its ability to learn new data as training progresses, a phenomenon known as plasticity loss. A pre-trained neural network shows inferior performance compared to a newly initialized model when trained on the same data (Ash & Adams, 2020; Berariu et al., 2021). Lyle et al. (2024b) demonstrated that unbounded weight growth is one of the main causes of plasticity loss and suggested weight decay and layer normalization as solutions.
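The core SWR operation, scaling a layer's weights down at each step to bound their magnitude without re-initializing, might look roughly like the sketch below. The interpolation rate and the choice of the initial norm as the target are illustrative assumptions, not the paper's exact schedule:

```python
def soft_weight_rescale(weights, init_norm, rate=0.01):
    """Soft-weight-rescaling sketch: after each update, pull the layer's
    weight norm part-way back toward its value at initialization,
    bounding growth while preserving the weight directions (and hence
    the learned information). The `rate` schedule and init-norm target
    are assumptions for illustration, not the paper's exact rule."""
    norm = sum(w * w for w in weights) ** 0.5
    if norm == 0.0:
        return weights
    # interpolate the norm toward init_norm by `rate`, then rescale
    target = (1 - rate) * norm + rate * init_norm
    scale = target / norm
    return [w * scale for w in weights]

w = [3.0, 4.0]  # norm 5.0
w2 = soft_weight_rescale(w, init_norm=1.0, rate=0.5)
print(round(sum(x * x for x in w2) ** 0.5, 6))  # 3.0: halfway from 5.0 toward 1.0
```

Because only the overall scale changes, the function computed by a homogeneous layer is preserved up to scaling, which is the intuition behind "without losing information" in the abstract.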
- Asia > South Korea > Gwangju > Gwangju (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
ViT-NeBLa: A Hybrid Vision Transformer and Neural Beer-Lambert Framework for Single-View 3D Reconstruction of Oral Anatomy from Panoramic Radiographs
Parida, Bikram Keshari, Sunilkumar, Anusree P., Sen, Abhijit, You, Wonsang
Dental diagnosis relies on two primary imaging modalities: panoramic radiographs (PX) providing 2D oral cavity representations, and Cone-Beam Computed Tomography (CBCT) offering detailed 3D anatomical information. While PX images are cost-effective and accessible, their lack of depth information limits diagnostic accuracy. CBCT addresses this but presents drawbacks including higher costs, increased radiation exposure, and limited accessibility. Existing reconstruction models further complicate the process by requiring CBCT flattening or prior dental arch information, often unavailable clinically. We introduce ViT-NeBLa, a vision transformer-based Neural Beer-Lambert model enabling accurate 3D reconstruction directly from single PX. Our key innovations include: (1) enhancing the NeBLa framework with Vision Transformers for improved reconstruction capabilities without requiring CBCT flattening or prior dental arch information, (2) implementing a novel horseshoe-shaped point sampling strategy with non-intersecting rays that eliminates intermediate density aggregation required by existing models due to intersecting rays, reducing sampling point computations by $52 \%$, (3) replacing CNN-based U-Net with a hybrid ViT-CNN architecture for superior global and local feature extraction, and (4) implementing learnable hash positional encoding for better higher-dimensional representation of 3D sample points compared to existing Fourier-based dense positional encoding. Experiments demonstrate that ViT-NeBLa significantly outperforms prior state-of-the-art methods both quantitatively and qualitatively, offering a cost-effective, radiation-efficient alternative for enhanced dental diagnostics.
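The Beer-Lambert model underlying NeBLa relates attenuation along a ray to transmittance, $T = \exp(-\sum_i \mu_i \, \Delta s)$. A minimal discretized sketch (uniform ray sampling, not the paper's horseshoe-shaped scheme):

```python
import math

def beer_lambert_transmittance(densities, step):
    """Discretized Beer-Lambert law along one ray:
    T = exp(-sum(mu_i * ds)), where mu_i are sampled attenuation
    coefficients and ds is the step length. This is the physical model
    behind the NeBLa rendering term; the uniform sampling here is a
    plain sketch, not the paper's non-intersecting-ray strategy."""
    optical_depth = sum(mu * step for mu in densities)
    return math.exp(-optical_depth)

# a ray passing through three samples of attenuation coefficient mu
t = beer_lambert_transmittance([0.5, 1.0, 0.5], step=1.0)
print(round(t, 4))  # 0.1353, i.e. exp(-2.0)
```

A panoramic radiograph pixel corresponds to such an accumulated transmittance, which is why recovering the per-point densities (the 3D volume) from a single projection is ill-posed and needs the learned prior the paper describes.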
- Europe > Switzerland (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- (6 more...)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Latent Behavior Diffusion for Sequential Reaction Generation in Dyadic Setting
Nguyen, Minh-Duc, Yang, Hyung-Jeong, Kim, Soo-Hyung, Shin, Ji-Eun, Kim, Seung-Won
The dyadic reaction generation task involves synthesizing responsive facial reactions that align closely with the behaviors of a conversational partner, enhancing the naturalness and effectiveness of human-like interaction simulations. This paper introduces a novel approach, the Latent Behavior Diffusion Model, comprising a context-aware autoencoder and a diffusion-based conditional generator that addresses the challenge of generating diverse and contextually relevant facial reactions from input speaker behaviors. The autoencoder compresses high-dimensional input features, capturing dynamic patterns in listener reactions while condensing complex input data into a concise latent representation, facilitating more expressive and contextually appropriate reaction synthesis. The diffusion-based conditional generator operates on the latent space generated by the autoencoder to predict realistic facial reactions in a non-autoregressive manner. This approach allows for generating diverse facial reactions that reflect subtle variations in conversational cues and emotional states. Experimental results demonstrate the effectiveness of our approach in achieving superior performance in dyadic reaction synthesis tasks compared to existing methods.
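For context, the diffusion-based conditional generator builds on the standard forward noising process $q(x_t \mid x_0)$; the sketch below shows that textbook step, applied here to a toy latent vector with an assumed schedule (not the paper's):

```python
import math
import random

def q_sample(x0, t, alphas_cumprod, eps=None):
    """Standard forward-diffusion step:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
    The paper's conditional generator learns the reverse of this process
    in the autoencoder's latent space; this sketch only shows the
    noising formula it builds on, with a toy schedule."""
    if eps is None:
        eps = [random.gauss(0.0, 1.0) for _ in x0]
    a = alphas_cumprod[t]
    return [math.sqrt(a) * x + math.sqrt(1 - a) * e for x, e in zip(x0, eps)]

abar = [0.99, 0.9, 0.5, 0.1]  # toy cumulative-alpha schedule
x0 = [1.0, -1.0]              # toy latent code from the autoencoder
xt = q_sample(x0, t=2, alphas_cumprod=abar, eps=[0.0, 0.0])
print(xt)  # with zero noise this is just sqrt(0.5) * x0
```

Operating in the compact latent space rather than on raw facial features is what lets the generator run non-autoregressively over whole reaction sequences.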
- Research Report > Promising Solution (0.48)
- Research Report > New Finding (0.34)
Challenges and Trends in Egocentric Vision: A Survey
Li, Xiang, Qiu, Heqian, Wang, Lanxiao, Zhang, Hanwen, Qi, Chenghao, Han, Linfeng, Xiong, Huiyu, Li, Hongliang
With the rapid development of artificial intelligence technologies and wearable devices, egocentric vision understanding has emerged as a new and challenging research direction, gradually attracting widespread attention from both academia and industry. Egocentric vision captures visual and multimodal data through cameras or sensors worn on the human body, offering a unique perspective that simulates human visual experiences. This paper provides a comprehensive survey of the research on egocentric vision understanding, systematically analyzing the components of egocentric scenes and categorizing the tasks into four main areas: subject understanding, object understanding, environment understanding, and hybrid understanding. We explore in detail the sub-tasks within each category. We also summarize the main challenges and trends currently existing in the field. Furthermore, this paper presents an overview of high-quality egocentric vision datasets, offering valuable resources for future research. By summarizing the latest advancements, we anticipate the broad applications of egocentric vision technologies in fields such as augmented reality, virtual reality, and embodied intelligence, and propose future research directions based on the latest developments in the field.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > China > Sichuan Province > Chengdu (0.04)
- (13 more...)
- Overview (1.00)
- Research Report > Promising Solution (0.45)
- Instructional Material > Course Syllabus & Notes (0.45)
- Leisure & Entertainment (1.00)
- Information Technology (1.00)
- Health & Medicine > Therapeutic Area (0.67)
- (3 more...)